NOTE: This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
In the last few practices/sessions, you learned about spatial point patterns. The next few sessions will concentrate on area data.
For this practice you will need the following:
This dataset includes the spatial information for the census tracts in the Hamilton Census Metropolitan Area (as polygons), and a host of demographic variables from the census of Canada, including population and languages.
In this practice, you will learn:
O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapter 7. John Wiley & Sons: New Jersey.
As usual, it is good practice to clear the working space to make sure that you do not have extraneous items there when you begin your work. The command in R to clear the workspace is rm (for “remove”), followed by a list of items to be removed. To clear the workspace from all objects, do the following:
rm(list = ls())
Note that ls() lists all objects currently in the workspace.
Load the libraries you will use in this activity:
library(tidyverse)
package 'tidyverse' was built under R version 3.4.3
-- Attaching packages --------------------------------------------- tidyverse 1.2.1 --
v ggplot2 2.2.1     v purrr   0.2.4
v tibble  1.4.1     v dplyr   0.7.4
v tidyr   0.7.2     v stringr 1.2.0
v readr   1.1.1     v forcats 0.2.0
package 'ggplot2' was built under R version 3.4.2
package 'tibble' was built under R version 3.4.3
package 'tidyr' was built under R version 3.4.3
package 'purrr' was built under R version 3.4.3
package 'dplyr' was built under R version 3.4.3
package 'forcats' was built under R version 3.4.3
-- Conflicts ------------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(rgdal)
Loading required package: sp
rgdal: version: 1.2-11, (SVN revision 676)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.2.0, released 2017/04/28
Path to GDAL shared files: C:/Users/Antonio/Documents/R/win-library/3.4/rgdal/gdal
Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
Path to PROJ.4 shared files: C:/Users/Antonio/Documents/R/win-library/3.4/rgdal/proj
Linking to sp version: 1.2-5
library(broom)
library(plotly)
package 'plotly' was built under R version 3.4.3
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
    last_plot
The following object is masked from 'package:stats':
    filter
The following object is masked from 'package:graphics':
    layout
library(cartogram)
package 'cartogram' was built under R version 3.4.3
library(gridExtra)
package 'gridExtra' was built under R version 3.4.3
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
    combine
Read the data that you will use for this practice. This is an ESRI shapefile, which will be read into an object of class SpatialPolygonsDataFrame. The function used to read ESRI shapefiles is rgdal::readOGR. Setting integer64 to "allow.loss" keeps the data as integers, as opposed to converting them to factors or strings:
Hamilton_CT <- readOGR(".", layer = "Hamilton CMA CT", integer64 = "allow.loss")
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "Hamilton CMA CT"
with 188 features
It has 255 fields
Integer64 fields read as signed 32-bit integers: ID, POPULATION, PRIVATE_DW, OCCUPIED_D, and the remaining census count fields (age by sex, marital status, families, households, and native languages; list truncated)
To use the plotting functions of ggplot2, the SpatialPolygonsDataFrame needs to be "tidied" by means of the tidy function of the broom package:
Hamilton_CT.t <- tidy(Hamilton_CT, region = "TRACT")
Hamilton_CT.t <- dplyr::rename(Hamilton_CT.t, TRACT = id)
Tidying the spatial dataframe strips it of the non-spatial information, but we can add all the data back by means of the left_join function:
Hamilton_CT.t <- left_join(Hamilton_CT.t, Hamilton_CT@data, by = "TRACT")
Column `TRACT` joining character vector and factor, coercing into character vector
Now the tidy dataframe Hamilton_CT.t contains the spatial information and the data.
You can quickly verify the contents of the dataframe by means of summary:
summary(Hamilton_CT.t)
(Output truncated for length: summary() reports the minimum, quartiles, median, and related statistics for the coordinate and identifier columns (long, lat, order, hole, piece, group, TRACT) and for each of the census variables, including POPULATION, LAND_AREA, POP_DENSIT, the age-by-sex counts, marital status, family and household composition, and the native language counts.)
Every phenomenon can be measured at a location (ask yourself: what exists outside of space?).
In point pattern analysis, the unit of support is the point, and the source of randomness is the location itself. Many other forms of data are also collected at points. For instance, when the census collects information on population, at its most basic, the information can be georeferenced to an address, that is, a point.
In numerous applications, however, data are not reported at their fundamental unit of support, but rather are aggregated to some other geometry, for instance an area. This is done for several reasons, including the privacy and confidentiality of the data. Instead of reporting individual-level information, the information is reported for zoning systems that often are devised without consideration to any underlying social, natural, or economic processes.
Census data, for instance, are reported at different levels of geography. In Canada, the smallest publicly available geography is called a Dissemination Area, or DA. A DA in Canada contains a population of between 400 and 700 persons. Thus, instead of reporting that one or more persons are located at a point (i.e., an address), the census reports the population of the DA. Other data are aggregated in similar ways (income, residential status, etc.).
At the highest level of aggregation, national level statistics are reported, for instance Gross Domestic Product, or GDP. Economic production is not evenly distributed across space; however, the national GDP does not distinguish regional variations in this process.
Ideally, a data analyst would work with data in its most fundamental support. This is not always possible, and therefore many techniques have been developed to work with data that have been aggregated to zones.
When working with areas, it is less practical to identify the area with the coordinates (as we did with points). After all, areas will be composed of lines and reporting all the relevant coordinates is impractical. Sometimes the geometric centroids of the areas are used instead.
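As a quick sketch (assuming the Hamilton_CT object read earlier in this practice), the coordinates method of the sp package returns the label points of the polygons, which for many purposes can stand in for the centroids:

```r
# Label points (approximate centroids) of the census tract polygons;
# coordinates() is an sp method for SpatialPolygonsDataFrame objects
head(coordinates(Hamilton_CT))
```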
More commonly, areas are assigned an index or unique identifier, so that a region will typically consist of a set of \(n\) areas as follows: \[ R = A_1 \cup A_2 \cup A_3 \cup ...\cup A_n. \]
The above is read as “the Region R is the union of Areas 1 to n”.
Regions can have a set of \(k\) attributes or variables associated with them, for instance: \[ \textbf{X}_i=[x_{i1}, x_{i2}, x_{i3},...,x_{ik}] \]
These attributes will typically be counts (e.g., number of people in a DA), or some summary measure of the underlying data (e.g., mean commute time).
Imagine that data on income by household were collected as follows:
df <- data.frame(x = c(0.3, 0.4, 0.5, 0.6, 0.7), y = c(0.1, 0.4, 0.2, 0.5, 0.3), Income = c(30000, 30000, 100000, 100000, 100000))
Households are geocoded as points with coordinates x and y, whereas income is in dollars.
Plot the income as points (hover over the points to see the attributes):
p <- ggplot(data = df, aes(x = x, y = y, color = Income)) +
geom_point(shape = 17, size = 5) +
coord_fixed()
ggplotly(p)
We recommend that you use the dev version of ggplot2 with `ggplotly()`
Install it with: `devtools::install_github('hadley/ggplot2')`
The underlying process is one of income sorting, with lower incomes to the west, and higher incomes to the east. This could be due to a geographical feature of the landscape (for instance, an escarpment), or the distribution of the housing stock (with a neighborhood that has more expensive houses). These are examples of a variable that responds to a common environmental factor. As an alternative, people may display a preference towards being near others that are similar to them (this is called homophily). When this happens, the variable responds to itself in space.
The quality of similarity or dissimilarity between neighboring observations of the same variable in space is called spatial autocorrelation. You will learn more about this later on.
Another reason why variables reported for areas could display similarities in space is as a consequence of the zoning system.
Suppose for a moment that the data above can only be reported at the zonal level, perhaps because of privacy and confidentiality concerns. Thanks to the great talent of the designers of the zoning system (or a felicitous coincidence!), the zoning system is such that it is consistent with the underlying process of sorting. The zones, therefore, are as follows:
zones1 <- data.frame(x1=c(0.2, 0.45), x2=c(0.45, 0.80), y1=c(0.0, 0.0), y2=c(0.6, 0.6), Zone_ID = c('1','2'))
If you add these zones to the plot:
p <- ggplot() +
geom_rect(data = zones1, mapping = aes(xmin = x1, xmax = x2, ymin = y1, ymax = y2, fill = Zone_ID), alpha = 0.3) +
geom_point(data = df, aes(x = x, y = y, color = Income), shape = 17, size = 5) +
coord_fixed()
ggplotly(p)
What is the mean income in zone 1? What is the mean income in zone 2? Not only are the summary measures of income highly representative of the observations they describe, the two zones are also highly distinct.
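You can verify these means directly. The short sketch below assigns each household to a zone using the boundary at x = 0.45 implied by zones1 (the Zone_ID column added to df is illustrative), then aggregates:

```r
# Assign households to zones using the zones1 boundary (x = 0.45)
df$Zone_ID <- ifelse(df$x < 0.45, '1', '2')
# Mean income by zone: 30000 in zone 1, 100000 in zone 2
aggregate(Income ~ Zone_ID, data = df, FUN = mean)
```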
Imagine now that for whatever reason (lack of prior knowledge of the process, convenience for data collection, etc.) the zones instead are as follows:
zones2 <- data.frame(x1=c(0.2, 0.55), x2=c(0.55, 0.80), y1=c(0.0, 0.0), y2=c(0.6, 0.6), Zone_ID = c('1','2'))
If you plot these zones:
p <- ggplot() +
geom_rect(data = zones2, mapping = aes(xmin = x1, xmax = x2, ymin = y1, ymax = y2, fill = Zone_ID), alpha = 0.3) +
geom_point(data = df, aes(x = x, y = y, color = Income), shape = 17, size = 5) +
coord_fixed()
ggplotly(p)
What is now the mean income of zone 1? What is the mean income of zone 2? The observations have not changed, and the generating spatial process remains the same. You will notice, however, that the summary measures for the two zones are more similar in this case than they were when the zones more closely captured the underlying process.
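Again, a quick calculation confirms this, now using the zones2 boundary at x = 0.55 (the Zone_ID2 column is again illustrative):

```r
# Assign households to zones using the zones2 boundary (x = 0.55)
df$Zone_ID2 <- ifelse(df$x < 0.55, '1', '2')
# Mean income by zone: (30000 + 30000 + 100000)/3 = 53333.33 in zone 1,
# 100000 in zone 2; the zones are now much less distinct
aggregate(Income ~ Zone_ID2, data = df, FUN = mean)
```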
The initial step when working with spatial area data, perhaps, is to visualize the data.
Commonly, area data are visualized by means of choropleth maps. A choropleth map is a map of the polygons that form the areas in the region, each colored in a way to represent the value of an underlying variable.
Let's use ggplot2 to create a choropleth map of population in Hamilton. Notice that the fill color for the polygons is given by cutting the values of POPULATION into five equal-sized segments. In other words, the colors represent zones in the bottom 20% of population, zones in the next 20%, and so on, so that the darkest zones are those with populations large enough to be in the top 20% of the population distribution:
ggplot() + geom_polygon(data = Hamilton_CT.t, aes(x = long, y = lat, group = group, fill = cut_number(POPULATION, 5)),color = NA, size = 0.1) +
scale_fill_brewer(palette = "YlOrRd") +
coord_fixed() +
theme(legend.position = "bottom") +
labs(fill = "Population")
Inspecting the map above, would you say that the distribution of population is random, or not random? If not random, what do you think might be an underlying process for the distribution of population?
Often, creating a choropleth map using the absolute value of a variable can be somewhat misleading. As seen in the map above, the zones with the largest population are also usually large zones. Any process that you might think of will be confounded by the size of the zones. For this reason, it is often more informative when creating a choropleth map to use a variable that is a rate, for instance population divided by area to give population density:
pop_den.map <- ggplot() + geom_polygon(data = Hamilton_CT.t, aes(x = long, y = lat, group = group, fill = cut_number(POPULATION/AREA, 5)),color = "white", size = 0.1) +
scale_fill_brewer(palette = "YlOrRd") +
coord_fixed() +
theme(legend.position = "bottom") +
labs(fill = "Pop Density")
pop_den.map
It can be seen now that the population density is higher in the more central parts of Hamilton, Burlington, Dundas, etc. Does the map look random? If not, what might be an underlying process that explains the variations in population density in a city like Hamilton?
Other times, it is appropriate to standardize not by area, but by what might be called the population at risk. For instance, let's say that we wanted to explore the distribution of the population of older adults (say, 65 and older). In this case, normalizing not by area but by the total population would remove the "size" effect, giving a proportion:
ggplot() + geom_polygon(data = Hamilton_CT.t, aes(x = long, y = lat, group = group, fill = cut_number((AGE_65_TO_ + AGE_70_TO_ + AGE_75_TO_ + AGE_80_TO_ + AGE_85)/POPULATION, 5)),color = NA, size = 0.1) +
scale_fill_brewer(palette = "YlOrRd") +
coord_fixed() +
theme(legend.position = "bottom") +
labs(fill = "Prop Age 65+")
Do you notice a pattern in the distribution of seniors in the Hamilton CMA?
There are a few things to keep in mind when creating choropleth maps.
First, what classification scheme to use, with how many classes, and what colors?
The examples above were all created using a classification scheme based on the quintiles of the distribution. As noted above, these are obtained by dividing the sample into 5 equal parts, to give the bottom 20% of observations, the next 20%, and so on. The quintiles are a particular case of a statistical measure known as quantiles, of which the median is the value obtained when the sample is divided into two equal-sized parts. Other classification schemes may be based on the mean, the standard deviation, and so on.
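For instance, since cut_number creates equal-count bins, its break points correspond to the quantiles of the variable, which can be inspected directly (using the Hamilton_CT data loaded earlier):

```r
# Quintile breaks of the population variable; these correspond to the
# cut points used by cut_number(POPULATION, 5) in the maps above
quantile(Hamilton_CT$POPULATION, probs = seq(0, 1, by = 0.2))
```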
In terms of how many classes to use, often there is little point in using more than six or seven classes, because the human eye cannot distinguish color differences at a much higher resolution.
The colors are a matter of style, but there are coloring schemes that are colorblind safe (see here).
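If you have the RColorBrewer package installed (the package behind ggplot2's scale_fill_brewer), you can display the palettes flagged as colorblind safe:

```r
library(RColorBrewer)
# Show only the palettes that are colorblind friendly
display.brewer.all(colorblindFriendly = TRUE)
```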
Secondly, when the zoning system is irregular (as opposed to, say, a raster), large zones can easily become dominant. In effect, much detail in the maps above is lost for small zones, whereas large zones, especially if similarly colored, may mislead the eye as to their relative frequency.
Another mapping technique, the cartogram, is meant to alleviate the issues caused by small and large zones.
A cartogram is a map where the size of the zones is adjusted so that instead of being the land area, it is proportional to some other variable of interest.
Let's illustrate the idea behind the cartogram here.
In the maps above, the zones are faithful to their geographical properties. Unfortunately, this obscures the relevance of small zones. A cartogram can instead be weighted by another variable, say, the population. In this way, the size of the zones will depend on the total population.
Cartograms are implemented in R in the package cartogram.
CT_pop_cartogram <- cartogram(shp = Hamilton_CT, weight = "POPULATION")
Spatial object is not projected; GEOS expects planar coordinates
Mean size error for iteration 1: 5.94063097656517
Mean size error for iteration 2: 4.22282836609772
Mean size error for iteration 3: 3.24121918122005
Mean size error for iteration 4: 2.70487592121738
Mean size error for iteration 5: 2.44050500280525
Mean size error for iteration 6: 2.28238579460187
Mean size error for iteration 7: 2.14591026585647
Mean size error for iteration 8: 1.95255176465899
Mean size error for iteration 9: 1.8186943555214
Mean size error for iteration 10: 1.73775203417466
Mean size error for iteration 11: 1.64845060853967
Mean size error for iteration 12: 1.45162213155545
Mean size error for iteration 13: 1.37178691507706
Mean size error for iteration 14: 1.32738198642848
Mean size error for iteration 15: 1.29727221750122
Notice that the value of the function cartogram (i.e., its output) is a SpatialPolygonsDataFrame. This object needs to be tidied if we wish to use ggplot2 to visualize it:
CT_pop_cartogram.t <- tidy(CT_pop_cartogram, region = "TRACT")
CT_pop_cartogram.t <- rename(CT_pop_cartogram.t, TRACT = id)
As before, the data were stripped from the tidied version of the dataframe, so they need to be restored:
CT_pop_cartogram.t <- left_join(CT_pop_cartogram.t, CT_pop_cartogram@data, by = "TRACT")
Column `TRACT` joining character vector and factor, coercing into character vector
Plotting the cartogram:
ggplot() +
  geom_polygon(data = CT_pop_cartogram.t,
               aes(x = long, y = lat, group = group,
                   fill = cut_number(POPULATION, 5)),
               color = "white", size = 0.1) +
  scale_fill_brewer(palette = "YlOrRd") +
  coord_fixed() +
  theme(legend.position = "bottom") +
  labs(fill = "Population")
Notice how the size of the zones has been adjusted.
The cartogram can be combined with coloring schemes, as in choropleth maps:
CT_popden_cartogram <- cartogram(shp = Hamilton_CT, weight = "POP_DENSIT")
Spatial object is not projected; GEOS expects planar coordinates
Mean size error for iteration 1: 29.0772344366743
Mean size error for iteration 2: 26.933147928614
Mean size error for iteration 3: 25.1986261892098
Mean size error for iteration 4: 23.7327939549969
Mean size error for iteration 5: 22.4463390026744
Mean size error for iteration 6: 21.2818852763495
Mean size error for iteration 7: 20.2025834378354
Mean size error for iteration 8: 19.1839656200552
Mean size error for iteration 9: 18.2099242308777
Mean size error for iteration 10: 17.2709049215263
Mean size error for iteration 11: 16.3609019577218
Mean size error for iteration 12: 15.4772258019554
Mean size error for iteration 13: 14.619308950628
Mean size error for iteration 14: 13.7881685736231
Mean size error for iteration 15: 12.9857607680246
Tidy and restore the data:
CT_popden_cartogram.t <- tidy(CT_popden_cartogram, region = "TRACT")
CT_popden_cartogram.t <- rename(CT_popden_cartogram.t, TRACT = id)
CT_popden_cartogram.t <- left_join(CT_popden_cartogram.t, CT_popden_cartogram@data, by = "TRACT")
Column `TRACT` joining character vector and factor, coercing into character vector
pop_den.cartogram <- ggplot() +
  geom_polygon(data = CT_popden_cartogram.t,
               aes(x = long, y = lat, group = group,
                   fill = cut_number(POP_DENSIT, 5)),
               color = "white", size = 0.1) +
  scale_fill_brewer(palette = "YlOrRd") +
  coord_fixed() +
  theme(legend.position = "bottom") +
  labs(fill = "Pop Density")
pop_den.cartogram
By combining a cartogram with choropleth mapping, it becomes easier to appreciate the way high population density is concentrated in the central parts of Hamilton, Burlington, etc.
grid.arrange(pop_den.map, pop_den.cartogram, nrow = 1)
This concludes Practice 9.